Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis
Adult content detection still poses a great challenge for automation.
Existing classifiers primarily focus on distinguishing between erotic and
non-erotic texts; however, they often lack the nuance needed to assess
potential harm. Unfortunately, content of this nature falls beyond the
reach of generative models due to its potentially harmful character:
ethical restrictions prohibit large language models (LLMs) from analyzing
and classifying harmful erotica, let alone generating it to create synthetic
datasets for other neural models. In such instances, where data is scarce and
challenging to obtain, a thorough analysis of the structure of such texts,
rather than a large model, may offer a viable solution, especially given that
harmful erotic narratives, despite appearing similar to harmless ones, usually
reveal their harmful nature only through contextual information hidden in the
non-sexual parts of the narrative.
This paper introduces a hybrid neural and rule-based context-aware system
that leverages coreference resolution to identify harmful contextual cues in
erotic content. Collaborating with professional moderators, we compiled a
dataset and developed a classifier capable of distinguishing harmful from
non-harmful erotic content. Our hybrid model, tested on Polish text,
demonstrates a promising accuracy of 84% and a recall of 80%. Models based on
RoBERTa and Longformer without explicit usage of coreference chains achieved
significantly weaker results, underscoring the importance of coreference
resolution in detecting nuanced content such as harmful erotica. This approach
also offers the potential for enhanced visual explainability, supporting
moderators in evaluating predictions and taking necessary actions to address
harmful content.

Comment: Accepted for the 6th Workshop on Computational Models of Reference,
Anaphora and Coreference at EMNLP 2023.
BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service
Advances in automated detection of offensive language online, including hate
speech and cyberbullying, require improved access to publicly available
datasets comprising social media content. In this paper, we introduce BAN-PL,
the first open dataset in the Polish language that encompasses texts flagged as
harmful and subsequently removed by professional moderators. The dataset
comprises a total of 691,662 pieces of content from a popular social
networking service, Wykop.pl, often referred to as the "Polish Reddit",
including both posts and comments, and is evenly distributed into two distinct
classes: "harmful" and "neutral". We provide a comprehensive description of the
data collection and preprocessing procedures, as well as highlight the
linguistic specificity of the data. The BAN-PL dataset, along with advanced
preprocessing scripts (including for unmasking profanities), will be made
publicly available.